Totally data-driven duration modeling based on generalized linear model for Mandarin TTS
Abstract
This paper proposes a totally data-driven duration modeling method for Mandarin TTS. It uses generalized linear models (GLMs) to model duration and stepwise regression to select the attribute set automatically, based on statistical measurements. The method achieves a good tradeoff between model complexity and goodness of fit. In addition, speaking rate is introduced as a new modeling attribute, which not only improves performance but also provides a novel way to adjust speaking rate at synthesis time. We also propose using R² to compare modeling performance fairly across different databases, since R² is the fraction of the variance explained by a model. Experiments show that GLM performs significantly better than CART. With much smaller models and a smaller corpus, the proposed method also achieves results comparable to those reported in other studies.
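The attribute selection and R² evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes a Gaussian GLM with identity link (ordinary least squares), forward-only stepwise selection, and BIC as the stopping criterion, none of which the abstract specifies. The synthetic data and all function names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, residual sum of squares)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)

def r_squared(rss, y):
    """Fraction of the variance of y explained by the model."""
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - rss / tss

def bic(rss, n, k):
    """BIC for a Gaussian model with k fitted coefficients (assumed criterion)."""
    return n * np.log(rss / n) + k * np.log(n)

def forward_stepwise(X, y):
    """Greedily add the attribute that lowers BIC most; stop when none does."""
    n, p = X.shape
    selected = []
    _, rss = fit_ols(X[:, selected], y)      # intercept-only baseline
    best = bic(rss, n, 1)
    while len(selected) < p:
        scores = {}
        for j in range(p):
            if j in selected:
                continue
            _, rss_j = fit_ols(X[:, selected + [j]], y)
            scores[j] = bic(rss_j, n, len(selected) + 2)
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break                            # no candidate improves the criterion
        selected.append(j_best)
        best = scores[j_best]
    return selected

# Synthetic stand-in for a duration corpus: 6 candidate attributes,
# of which only columns 0 and 3 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.normal(size=500)

selected = forward_stepwise(X, y)
_, rss = fit_ols(X[:, selected], y)
r2 = r_squared(rss, y)
print("selected attributes:", selected)
print("R^2:", round(r2, 3))
```

Because R² is normalized by the total variance of the target, it can be compared across corpora whose duration variances differ, which is the motivation the abstract gives for preferring it.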
Related resources
A Unified Totally-Data-Driven Framework for Duration and Intonation Modeling
This paper proposes a unified framework for duration and intonation modeling in Mandarin TTS. In this framework, we design a novel parametric representation of Mandarin intonation based on orthogonal polynomial approximation. With this representation, we can decompose the F0 vector into 3 orthogonal polynomial parameters that are continuous scalars. Based on this vector-to-scalar decomposition, we ca...
The Toshiba Mandarin TTS System for the Blizzard Challenge 2008
This paper describes the Toshiba Mandarin Text-to-Speech (TTS) system that was submitted to the Blizzard Challenge 2008. The front-end of the system uses machine-learning approaches such as generalized linear models (GLM) and Quantification Method Type 1 (QMT1) to predict pause, duration and F0 contour. According to the predicted prosody information, the back-end of the system uses Toshiba’s ow...
Totally data-driven intonation prediction model using a novel F0 contour parametric representation
This paper proposes a novel parametric representation of Mandarin intonation based on orthogonal polynomial approximation. The polynomial is a simplified representation of the Parallel Encoding and Target Approximation (PENTA) intonation model, which includes a target component and an approximation component. We also propose predicting the polynomial parameters from linguistic and phonetic attributes...
Phrase break prediction using logistic generalized linear model
In this paper we propose a novel phrase break prediction model for Mandarin speech synthesis. The model is a generalized linear model (GLM) fitted with stepwise regression. We assume that phrase breaks follow a Bernoulli distribution and model the phrase break probability with a logistic GLM. The attribute set is selected automatically by stepwise regression, which makes the method totally data-driven. We also introd...
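To make the Bernoulli assumption concrete, a logistic GLM can be fitted by Newton's method (iteratively reweighted least squares). The sketch below is only an illustration under stated assumptions: the two numeric attributes, the synthetic labels, and the function names are hypothetical, and the stepwise attribute selection mentioned in the snippet is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_glm(X, y, n_iter=25):
    """Fit a Bernoulli GLM with logit link by Newton's method (IRLS)."""
    A = np.column_stack([np.ones(len(y)), X])    # add intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = sigmoid(A @ beta)                    # predicted break probability
        w = p * (1.0 - p)                        # Bernoulli variance weights
        grad = A.T @ (y - p)                     # gradient of the log-likelihood
        hess = A.T @ (A * w[:, None])            # negative Hessian (Fisher information)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def predict_proba(beta, X):
    A = np.column_stack([np.ones(X.shape[0]), X])
    return sigmoid(A @ beta)

# Synthetic stand-in: two hypothetical numeric attributes per word boundary.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
true_beta = np.array([-0.5, 1.0, -2.0])          # intercept, attribute 1, attribute 2
p_true = sigmoid(true_beta[0] + X @ true_beta[1:])
y = (rng.random(1000) < p_true).astype(float)    # 1 = phrase break, 0 = no break

beta = fit_logistic_glm(X, y)
probs = predict_proba(beta, X)
print("estimated coefficients:", np.round(beta, 2))
```

A boundary would then be labeled as a break when its predicted probability exceeds a chosen threshold; the threshold itself is a design choice that the snippet does not discuss.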
Modeling the Acoustic Correlates of Dialog Act for Expressive Chinese TTS Synthesis
This paper proposes a novel approach to describing the expressivity of dialog text and modeling its acoustic correlates for expressive text-to-speech (TTS) synthesis. We apply Dialog Acts (DAs) to describe expressivity. In particular, we set up a Wizard-of-Oz (WoZ) data collection framework to collect a tourism-domain corpus and annotate the DAs. A Pitch Target model which is opt...
Publication date: 2006